CORE search results: 117 research outputs found

A formal semantics for control and data flow in the Gannet service-based system-on-chip architecture

Author: Vanderbauwhede W.
Publication date: 14/07/2008

There is a growing demand for solutions that allow the design of large and complex reconfigurable Systems-on-Chip (SoC) at high abstraction levels. The Gannet project proposes a functional programming approach for high-abstraction design of very large SoCs. Gannet is a distributed service-based SoC architecture, i.e. a network of services offered by hardware or software cores. The Gannet SoC is task-level reconfigurable: it performs tasks by executing functional task description programs using a demand-driven dataflow mechanism. The Gannet architecture combines the flexible connectivity offered by a Network-on-Chip with the functional language paradigm to create a fully concurrent distributed SoC with the option to completely separate data flows from control flows. This feature is essential to avoid a bottleneck at the controller for run-time control of multiple high-throughput data flows. In this paper we present the Gannet architecture and language and introduce an operational semantics to formally describe the mechanism for separating control and data flows.

Sources: CiteSeerX; Enlighten
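The Gannet abstract describes tasks as functional programs evaluated by a demand-driven dataflow mechanism. A minimal sketch of that evaluation order, assuming a hypothetical task format of nested tuples `(service, arg, ...)` and hypothetical services `add`, `mul` and `if`; it is not Gannet's task description language or runtime, and it runs sequentially in one process rather than across a network of cores. Each service receives its arguments as zero-argument callables and evaluates them only when it needs them, so the `if` service never computes the branch it does not take.

```python
from typing import Any, Callable, Dict, List

# Hypothetical task representation: a tuple (service_name, arg1, arg2, ...);
# anything that is not a tuple is a literal value.
Task = Any


class Runtime:
    def __init__(self) -> None:
        self.calls: List[str] = []  # log of the services actually invoked
        # Each service receives "thunks": zero-argument callables it invokes
        # to demand an argument's value (demand-driven evaluation).
        self.services: Dict[str, Callable[..., Any]] = {
            "add": lambda a, b: a() + b(),
            "mul": lambda a, b: a() * b(),
            # Demands the condition first, then only the selected branch.
            "if": lambda c, t, e: t() if c() else e(),
        }

    def eval(self, task: Task) -> Any:
        if not isinstance(task, tuple):
            return task  # literal: no service involved
        name, *args = task
        self.calls.append(name)
        # Default argument a=a binds each argument to its own thunk.
        thunks = [lambda a=a: self.eval(a) for a in args]
        return self.services[name](*thunks)


rt = Runtime()
prog = ("if", 1, ("add", 2, 3), ("mul", 4, 5))
print(rt.eval(prog))  # 5
print(rt.calls)       # ['if', 'add']  (the 'mul' branch was never demanded)
```

Passing thunks rather than values is what makes the evaluation demand-driven: a service pulls only the results it actually uses.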

Code optimisation in a nested-sampling algorithm

Authors: Ireland D.G., Lewis S.J., Vanderbauwhede W.
Publication venue: Elsevier BV
Publication date: 11/06/2015

The speed-up in program running time is investigated for problems of parameter estimation with Nested Sampling Monte Carlo methods. The example used in this study is the extraction of a polarization observable from event-by-event data from meson photoproduction reactions. Various implementations of the basic algorithm were compared, consisting of combinations of single-threaded vs. multi-threaded and CPU vs. GPU versions. These were implemented in OpenMP and OpenCL. For the application under study, and with the number of events used in our work, we find that straightforward multi-threaded CPU OpenMP coding gives the best performance; for larger numbers of events, OpenCL on the CPU performs better. The study also shows that there is a “break-even” point in the number of events where the use of GPUs begins to help performance. GPUs are not found to be generally helpful for this problem, because the data transfer times more than offset the improvement in computation time.

Sources: Elsevier - Publisher Connector; Enlighten
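A minimal single-threaded sketch of the basic nested-sampling loop (Skilling's algorithm), to make the algorithm being optimised concrete. The toy prior (uniform on [-5, 5]), the Gaussian toy likelihood, the fixed iteration count and the rejection-sampling replacement step are all assumptions for illustration, not the paper's OpenMP or OpenCL code. In the paper's application the likelihood is computed from event-by-event data, so its evaluation is the natural target for parallelisation; this sketch does not attempt that.

```python
import math
import random
from typing import Callable


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b)), allowing -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(log_likelihood: Callable[[float], float],
                    sample_prior: Callable[[random.Random], float],
                    n_live: int = 100, n_iter: int = 600, seed: int = 0) -> float:
    """Return an estimate of ln Z (the evidence) from the basic nested-sampling loop."""
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(x) for x in live]
    log_z = -math.inf
    log_x = 0.0  # ln of the remaining prior volume X, starting at X = 1
    # On average X shrinks by exp(-1/n_live) per step, so the discarded
    # shell has prior volume X * (1 - exp(-1/n_live)).
    log_shell = math.log1p(-math.exp(-1.0 / n_live))
    for _ in range(n_iter):
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        log_z = logaddexp(log_z, logl_min + log_x + log_shell)
        log_x -= 1.0 / n_live
        # Replace the worst point by a prior draw with higher likelihood
        # (plain rejection sampling: simple, but slow once X is small).
        while True:
            x = sample_prior(rng)
            lx = log_likelihood(x)
            if lx > logl_min:
                break
        live[worst], live_logl[worst] = x, lx
    # Remaining live points each carry prior volume X / n_live.
    for lx in live_logl:
        log_z = logaddexp(log_z, lx + log_x - math.log(n_live))
    return log_z


# Toy problem: uniform prior on [-5, 5], likelihood exp(-x^2 / 2).
# Exact evidence: Z ~= sqrt(2*pi) / 10, i.e. ln Z ~= -1.384.
est = nested_sampling(lambda x: -0.5 * x * x, lambda rng: rng.uniform(-5.0, 5.0))
print(est, math.log(math.sqrt(2 * math.pi) / 10))
```

With rejection sampling, each replacement costs roughly 1/X prior draws as the remaining volume X shrinks; practical implementations use more efficient constrained samplers.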

Design and implementation of the Quarc network on-chip

Authors: Maji P.P., Moadeli M., Vanderbauwhede W.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2009

Networks-on-Chip (NoC) have emerged as an alternative to buses, providing a packet-switched communication medium for the modular development of large Systems-on-Chip. However, to successfully replace its predecessor, the NoC has to be able to efficiently exchange all types of traffic, including collective communications. The latter is especially important for, e.g., cache updates in multicore systems. The Quarc NoC architecture has been introduced as a Network-on-Chip that is highly efficient in exchanging all types of traffic, including broadcast and multicast. In this paper we present the hardware implementation of the switch architecture and the network adapter (transceiver) of the Quarc NoC. Moreover, the paper presents an analysis and comparison of the cost and performance of the Quarc and Spidergon NoCs implemented in Verilog, targeting the Xilinx Virtex FPGA family. We demonstrate a dramatic improvement in performance over the Spidergon, especially for broadcast traffic, at no additional hardware cost.

Sources: Crossref; Enlighten
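The Quarc entries compare against the Spidergon NoC. As background only, a minimal sketch of Spidergon as it is commonly described: n nodes on a bidirectional ring, each also linked to the node directly opposite (i + n/2), with across-first routing (stay on the ring when the destination is at most n/4 hops away, otherwise take the across link first), which gives shortest paths on this topology. The Quarc topology and routing are not described here in enough detail to sketch, so nothing below models Quarc; the function names are hypothetical. The last function counts the link traversals needed to emulate a broadcast with n - 1 separate unicasts, the kind of collective traffic the abstracts single out.

```python
def spidergon_hops(src: int, dst: int, n: int) -> int:
    """Hops from src to dst under across-first routing (n assumed divisible by 4)."""
    d = (dst - src) % n              # clockwise ring distance
    if d <= n // 4:
        return d                     # short way clockwise along the ring
    if d >= 3 * n // 4:
        return n - d                 # short way counter-clockwise along the ring
    return 1 + abs(d - n // 2)       # take the across link, then finish on the ring


def average_hops(n: int) -> float:
    """Mean hop count over all ordered (src, dst) pairs, i.e. uniform traffic."""
    total = sum(spidergon_hops(s, t, n) for s in range(n) for t in range(n) if s != t)
    return total / (n * (n - 1))


def unicast_broadcast_hops(n: int, src: int = 0) -> int:
    """Link traversals if src reaches every other node with n - 1 separate unicasts."""
    return sum(spidergon_hops(src, t, n) for t in range(n) if t != src)


n = 16
print(max(spidergon_hops(0, t, n) for t in range(1, n)))  # 4, i.e. n / 4
print(average_hops(n))                                     # 2.6
print(unicast_broadcast_hops(n))                           # 39
```

For n = 16 this gives a diameter of 4 hops, an average of 2.6 hops under uniform traffic, and 39 link traversals for a unicast-emulated broadcast from one node.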

Quarc: a novel network-on-chip architecture

Authors: Moadeli M., Shahrabi A., Vanderbauwhede W.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

This paper introduces the Quarc NoC, a novel NoC architecture inspired by the Spidergon NoC. The Quarc scheme significantly outperforms the Spidergon NoC through better traffic balancing, which is achieved by modifications to the topology and the routing elements. The proposed architecture is highly efficient in performing collective communication operations, including broadcast and multicast. We present the topology, routing discipline and switch architecture for the Quarc NoC and demonstrate its performance with results obtained from discrete-event simulations.

Sources: Crossref; Enlighten; ResearchOnline@GCU

FPGA-accelerated information retrieval: high-efficiency document filtering

Authors: Azzopardi L., Moadeli M., Vanderbauwhede W.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/08/2009

Power consumption in data centres is a growing issue, as the cost of the power for computation and cooling has become dominant. An emerging challenge is the development of “environmentally friendly” systems. In this paper we present a novel application of FPGAs for the acceleration of information retrieval algorithms, specifically, filtering streams/collections of documents against topic profiles. Our results show that FPGA acceleration can result in speed-ups of up to a factor of 20 for large profiles.

Source: Enlighten
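A minimal software sketch of filtering a document stream against a topic profile, as a reference point for what the FPGA accelerates. The scoring model (sum of the profile weights of the document's terms, compared against a threshold) is an assumption for illustration; the abstract does not specify the paper's scoring function, and the profile weights and threshold below are made up.

```python
from typing import Dict, Iterable, List


def score(doc_terms: Iterable[str], profile: Dict[str, float]) -> float:
    """Sum the profile weights of every term occurrence in the document."""
    return sum(profile.get(t, 0.0) for t in doc_terms)


def filter_stream(docs: Iterable[List[str]], profile: Dict[str, float],
                  threshold: float) -> List[int]:
    """Return the indices of documents whose score reaches the threshold."""
    return [i for i, d in enumerate(docs) if score(d, profile) >= threshold]


profile = {"fpga": 2.0, "energy": 1.5, "retrieval": 1.0}  # hypothetical topic profile
docs = [
    "fpga acceleration of retrieval".split(),
    "weather report for glasgow".split(),
    "energy use of fpga servers".split(),
]
print(filter_stream(docs, profile, threshold=3.0))  # [0, 2]
```

The per-term profile lookups are independent of one another, so in principle they can be pipelined or performed in parallel; how the paper maps them onto the FPGA is not described in the abstract.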

Throughput analysis for a high-performance FPGA-accelerated real-time search application

Authors: Chalamalasetti S.R., Margala M., Vanderbauwhede W.
Publication venue: Hindawi Limited
Publication date: 01/01/2012

We propose an FPGA design for the relevancy computation part of a high-throughput real-time search application. The application matches terms in a stream of documents against a static profile held in off-chip memory. We present a mathematical analysis of the throughput of the application and apply it to the problem of scaling the Bloom filter used to discard non-matches.

Sources: Crossref; Directory of Open Access Journals; Enlighten
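The entry above mentions a Bloom filter that discards non-matching terms before the profile in off-chip memory is consulted. A minimal software Bloom filter sketch: the bit-array size `m_bits`, the number of hash functions `k_hashes`, and the use of salted SHA-256 to derive the hash functions are assumptions, not the paper's hardware hash functions. A negative answer is definitive (the term is not in the profile, so no off-chip lookup is needed); a positive answer may be a false positive and must still be checked against the profile. For n stored terms the false-positive rate is approximately (1 - e^(-kn/m))^k.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, term: str) -> Iterator[int]:
        # Derive k bit positions by hashing the term with k different salts.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, term: str) -> None:
        for p in self._positions(term):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, term: str) -> bool:
        # False => definitely absent; True => possibly present.
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(term))


def false_positive_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Standard approximation (1 - e^(-k n / m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


profile_terms = ["fpga", "bloom", "filter", "search"]  # hypothetical profile
bf = BloomFilter(m_bits=1024, k_hashes=3)
for t in profile_terms:
    bf.add(t)

doc = "real time search on an fpga".split()
candidates = [t for t in doc if bf.might_contain(t)]  # only these need the off-chip lookup
print(candidates)
print(false_positive_rate(1024, 3, len(profile_terms)))
```

Scaling the filter, as analysed in the paper, amounts to choosing m and k for a given profile size n so that few non-matching terms slip through to the off-chip lookup.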

A performance model of communication in the Quarc NoC

Authors: Moadeli M., Shahrabi A., Vanderbauwhede W.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2008

Networks-on-Chip (NoCs) have emerged as a promising communication medium for future MPSoC development. To serve this purpose, NoCs have to be able to efficiently exchange all types of traffic, including collective communications, at a reasonable cost. The Quarc NoC has been introduced as a NoC that is highly efficient in performing collective communication operations such as broadcast and multicast. This paper presents an introduction to the Quarc scheme and an analytical model to compute the average message latency in the architecture. To validate the model, we compare its latency predictions against results obtained from discrete-event simulations.

Sources: Crossref; Enlighten; ResearchOnline@GCU

Development of Bayesian analysis program for extraction of polarisation observables at CLAS

Authors: Ireland D., Lewis S., Vanderbauwhede W.
Publication venue: IOP Publishing
Publication date: 11/06/2014

At the mass scale of a proton, the strong force is not well understood. Various quark models exist, but it is important to determine which quark model(s) are most accurate. Experimentally, finding resonances predicted by some models and not others would give valuable insight into this fundamental interaction. Several labs around the world use photoproduction experiments to find these missing resonances. The aim of this work is to develop a robust Bayesian data analysis program for extracting polarisation observables from pseudoscalar meson photoproduction experiments using CLAS at Jefferson Lab. This method, known as nested sampling, has been compared to traditional methods and incorporates data parallelisation and GPU programming. It uses an event-by-event likelihood function, which avoids the loss of information associated with histogram binning, and results can easily be constrained to the physical region. One of the most important advantages of the nested sampling approach is that data from different experiments can be combined and analysed simultaneously. Results on both simulated and previously analysed experimental data for the K+Λ channel will be discussed.

Source: Enlighten
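The entry above contrasts an event-by-event likelihood with histogram binning. A minimal sketch of an unbinned log-likelihood for one parameter, using a toy angular distribution f(φ; a) = (1 + a cos 2φ) / (2π) on [0, 2π). This distribution, the simulated data and the grid-search fit are assumptions for illustration, not the CLAS analysis, which explores the likelihood with nested sampling rather than maximising it on a grid. Restricting the scan to |a| < 1 keeps f positive, a simple analogue of constraining results to the physical region.

```python
import math
import random
from typing import List


def pdf(phi: float, a: float) -> float:
    """Toy angular distribution, normalised on [0, 2*pi) for |a| <= 1."""
    return (1.0 + a * math.cos(2.0 * phi)) / (2.0 * math.pi)


def log_likelihood(a: float, events: List[float]) -> float:
    """Unbinned log-likelihood: one term per event, no histogram."""
    return sum(math.log(pdf(phi, a)) for phi in events)


def simulate(a_true: float, n: int, seed: int = 1) -> List[float]:
    """Draw n angles from pdf(., a_true) by accept-reject sampling."""
    rng = random.Random(seed)
    f_max = (1.0 + abs(a_true)) / (2.0 * math.pi)
    events: List[float] = []
    while len(events) < n:
        phi = rng.uniform(0.0, 2.0 * math.pi)
        if rng.uniform(0.0, f_max) < pdf(phi, a_true):
            events.append(phi)
    return events


events = simulate(a_true=0.4, n=5000)
# Scan only the physical region |a| < 1, where pdf stays positive.
grid = [i / 50.0 for i in range(-49, 50)]
a_hat = max(grid, key=lambda a: log_likelihood(a, events))
print(a_hat)
```

Because every event contributes its own term, no information is lost to the choice of bin edges; the cost is one pdf evaluation per event per parameter value, which is why the likelihood sum dominates the run time.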

A C++-embedded Domain-Specific Language for programming the MORA soft processor array

Authors: Chalamalasetti S.R., Margala M., Purohit S., Vanderbauwhede W.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2010

MORA is a novel platform for high-level FPGA programming of streaming vector and matrix operations, aimed at multimedia applications. It consists of a soft array of pipelined low-complexity SIMD processors-in-memory (PIM). We present a Domain-Specific Language (DSL) for high-level programming of the MORA soft processor array. The DSL is embedded in C++, providing designers with a familiar language framework and the ability to compile designs using a standard compiler for functional testing before generating the FPGA bitstream using the MORA toolchain. The paper discusses the MORA-C++ DSL and the compilation route to assembly code for the MORA machine, and provides examples to illustrate the programming model and performance.

Source: Enlighten
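The MORA entry describes a DSL embedded in C++. The underlying technique, operator overloading that builds a program representation instead of computing values, is sketched here in Python rather than C++, purely as an illustration: expressions over hypothetical `Vec` inputs build an expression graph that is then lowered to three-address instructions. The `Vec` class, `input_vec`, `lower`, the mnemonics `ADD`/`MUL` and the register names are all hypothetical; they are not MORA's API or instruction set.

```python
from typing import Dict, List, Optional, Tuple


class Vec:
    """A node in an expression graph; arithmetic builds nodes instead of computing."""

    def __init__(self, op: str, args: Tuple["Vec", ...] = (), name: Optional[str] = None):
        self.op, self.args, self.name = op, args, name

    def __add__(self, other: "Vec") -> "Vec":
        return Vec("ADD", (self, other))

    def __mul__(self, other: "Vec") -> "Vec":
        return Vec("MUL", (self, other))


def input_vec(name: str) -> Vec:
    return Vec("INPUT", name=name)


def lower(root: Vec) -> List[str]:
    """Emit three-address code in dependency order (post-order traversal)."""
    code: List[str] = []
    regs: Dict[int, str] = {}  # node id -> register, so shared nodes are emitted once

    def visit(node: Vec) -> str:
        if id(node) in regs:
            return regs[id(node)]
        if node.op == "INPUT":
            regs[id(node)] = node.name
            return node.name
        srcs = [visit(a) for a in node.args]
        dst = f"r{len(code)}"
        code.append(f"{node.op} {dst}, {', '.join(srcs)}")
        regs[id(node)] = dst
        return dst

    visit(root)
    return code


a, b, c = input_vec("a"), input_vec("b"), input_vec("c")
for line in lower((a + b) * c):
    print(line)
# ADD r0, a, b
# MUL r1, r0, c
```

Shared subexpressions are emitted once because nodes are cached by identity, so `s * s` with `s = a + b` yields a single `ADD`. The same overloaded operators could instead compute values directly, which is one way an embedded DSL can be compiled with a standard compiler and run for functional testing, as the abstract mentions.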